

A Reproducible, Scalable Pipeline for Synthesizing Autoregressive Model Literature

Alpay, Faruk, Kilictas, Bugra, Alakkad, Hamdi

arXiv.org Artificial Intelligence

The number of publications on generative modelling has grown exponentially over the last decade, with dozens of new papers on large language models and autoregressive (AR) techniques appearing each week. This deluge renders manual literature reviews impractical and hampers reproducibility. Systematic literature review (SLR) pipelines such as PROMPTHEUS (Torres et al., 2024) and modular summarisation frameworks (Achkar et al., 2024) have shown that automation can reduce the burden on researchers; however, they are domain-agnostic and often separate extraction from experimental validation. Our goal is to advance this line of work by delivering a fully integrated pipeline focused on AR models that not only summarises research but also extracts the hyperparameters, architectures, and metrics needed to reproduce experiments. The challenges motivating our work are threefold. First, the "literature overload" problem means that even experts struggle to keep up with emergent models and techniques. Second, reproducibility remains an open concern in machine learning: a lack of transparent reporting of code and hyperparameters has led to irreproducible claims (Kapoor and Narayanan, 2022). Initiatives such as the NeurIPS reproducibility checklist encourage authors to document training settings and datasets (Pineau et al., 2021), yet many papers still omit critical information. Third, AR models themselves are evolving rapidly, from recurrent architectures such as LSTMs (Merity et al., 2017; Bengio et al., 2003) to Transformer-based systems (Vaswani et al., 2017) and emerging large language models (Touvron et al., 2023).
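The abstract above describes a pipeline that extracts hyperparameters, architectures, and metrics from paper text. As a rough illustration of what such extraction might look like, here is a minimal sketch using regular expressions; the pattern set, field names, and function are hypothetical assumptions for illustration, not the authors' actual implementation.

```python
import re

# Hypothetical illustration: a regex-based extractor for common
# hyperparameters mentioned in paper text. The patterns and field
# names are assumptions, not the pipeline's actual method.
PATTERNS = {
    "learning_rate": r"learning rate (?:of )?([0-9.eE-]+)",
    "batch_size": r"batch size (?:of )?(\d+)",
    "layers": r"(\d+)[- ]layer",
}

def extract_hyperparameters(text):
    """Return any hyperparameters found in a paper excerpt."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found[name] = match.group(1)
    return found

excerpt = ("We train a 6-layer Transformer with a learning rate of 3e-4 "
           "and batch size 128.")
print(extract_hyperparameters(excerpt))
# {'learning_rate': '3e-4', 'batch_size': '128', 'layers': '6'}
```

A production pipeline would likely combine such surface patterns with LLM-based extraction and validation against the paper's tables, but the sketch shows the shape of the structured output needed to reproduce an experiment.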


Benchmarking Suite for Synthetic Aperture Radar Imagery Anomaly Detection (SARIAD) Algorithms

Chauvin, Lucian, Gupta, Somil, Ibarra, Angelina, Peeples, Joshua

arXiv.org Artificial Intelligence

Anomaly detection is a key research challenge in computer vision and machine learning with applications in many fields, from quality control to radar imaging. In radar imaging, specifically synthetic aperture radar (SAR), anomaly detection can be used for the classification, detection, and segmentation of objects of interest. However, there is no standard suite for developing and benchmarking such methods on SAR imagery. To address this gap, we introduce SAR imagery anomaly detection (SARIAD). In conjunction with Anomalib, a deep-learning library for anomaly detection, SARIAD provides a comprehensive suite of algorithms and datasets for assessing and developing anomaly detection approaches on SAR imagery. SARIAD integrates multiple SAR datasets along with tools to apply various anomaly detection algorithms to SAR imagery, and offers several anomaly detection metrics and visualizations. Overall, SARIAD acts as a central package for benchmarking SAR models and datasets, enabling reproducible research in anomaly detection on SAR imagery. This package is publicly available: https://github.com/Advanced-Vision-and-Learning-Lab/SARIAD.
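The abstract mentions that the suite provides anomaly detection metrics for SAR imagery. As a hedged, self-contained sketch of one such pixel-level metric, the following computes intersection-over-union between a thresholded anomaly-score map and a ground-truth mask; the function name, threshold, and flattened-list representation are illustrative assumptions, not SARIAD's actual API.

```python
# Hypothetical sketch of a pixel-level anomaly-segmentation metric:
# threshold per-pixel anomaly scores, then compute IoU against a
# binary ground-truth mask (pixels flattened into lists).
def anomaly_iou(pred_scores, truth_mask, threshold=0.5):
    """Threshold anomaly scores and compute IoU against a binary mask."""
    pred_mask = [score >= threshold for score in pred_scores]
    intersection = sum(p and t for p, t in zip(pred_mask, truth_mask))
    union = sum(p or t for p, t in zip(pred_mask, truth_mask))
    # Define IoU as 1.0 when both masks are empty (nothing to detect).
    return intersection / union if union else 1.0

scores = [0.9, 0.8, 0.2, 0.1, 0.7]  # per-pixel anomaly scores
truth  = [1, 1, 0, 0, 0]            # ground-truth anomaly mask
print(anomaly_iou(scores, truth))   # 2 overlapping pixels / 3 in union
```

Benchmarking suites typically report threshold-free metrics such as AUROC and AUPRO alongside thresholded ones; this sketch shows only the simplest case.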


Wildest Dreams: Reproducible Research in Privacy-preserving Neural Network Training

Khan, Tanveer, Budzys, Mindaugas, Nguyen, Khoa, Michalas, Antonis

arXiv.org Artificial Intelligence

Machine Learning (ML) addresses a multitude of complex issues in multiple disciplines, including social sciences, finance, and medical research. ML models require substantial computing power and are only as powerful as the data utilized. Due to the high computational cost of ML methods, data scientists frequently use Machine Learning-as-a-Service (MLaaS) to outsource computation to external servers. However, when working with private information, like financial data or health records, outsourcing the computation might result in privacy issues. Recent advances in Privacy-Preserving Techniques (PPTs) have enabled ML training and inference over protected data through the use of Privacy-Preserving Machine Learning (PPML). However, these techniques are still at a preliminary stage and their application in real-world situations is demanding. In order to understand the discrepancy between theoretical research proposals and actual applications, this work examines the past and present of PPML, focusing on Homomorphic Encryption (HE) and Secure Multi-party Computation (SMPC) applied to ML. This work primarily focuses on the ML model's training phase, where maintaining user data privacy is of utmost importance. We provide a solid theoretical background that eases the understanding of current approaches and their limitations. In addition, we present an SoK of the most recent PPML frameworks for model training and provide a comprehensive comparison in terms of their unique properties and performance on standard benchmarks. We also reproduce the results for some of the papers and examine the extent to which existing works in the field support open science. We believe our work serves as a valuable contribution by raising awareness about the current gap between theoretical advancements and real-world applications in PPML, specifically regarding open-source availability, reproducibility, and usability.
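The SMPC protocols surveyed in work like this are typically built on additive secret sharing: a value is split into random shares held by different parties, so that no single party learns the secret, yet parties can compute sums over shared values. A minimal sketch, assuming a toy 32-bit modulus and three parties (real PPML frameworks use fixed-point encodings, authenticated shares, and more parties or thresholds):

```python
import random

# Minimal sketch of additive secret sharing over Z_{2^32}, the basic
# building block of many SMPC protocols. Parameters are illustrative.
MODULUS = 2**32

def share(secret, n_parties=3):
    """Split a secret into n additive shares modulo MODULUS."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the secret.
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    """Recombine shares by summing them modulo MODULUS."""
    return sum(shares) % MODULUS

# Linearity: parties can add their shares of two secrets locally,
# yielding shares of the sum without revealing either input.
a, b = 42, 100
shares_a, shares_b = share(a), share(b)
sum_shares = [(x + y) % MODULUS for x, y in zip(shares_a, shares_b)]
print(reconstruct(sum_shares))  # 142
```

Secure multiplication requires extra machinery (e.g. Beaver triples), which is one reason PPML training, dominated by matrix multiplications, remains far more expensive than plaintext training.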


Teaching reproducible research for medical students and postgraduate pharmaceutical scientists

Meid, Andreas D.

arXiv.org Machine Learning

In many academic settings, medical students begin their scientific work during their studies. At our institution, for example, they often work in interdisciplinary teams with more or less experienced (postgraduate) researchers in pharmaceutical sciences, the natural sciences in general, or biostatistics. All of them should be taught good research practices as an integral part of their education, especially in terms of statistical analysis. This includes reproducibility as a central aspect of modern research. Acknowledging that even educators might be unfamiliar with the requirements of a perfectly reproducible workflow, I agreed to give a lecture series on reproducible research (RR) for medical students and postgraduate pharmacists involved in several areas of clinical research. I designed a piloting lecture series to highlight definitions of RR, reasons for RR, potential merits of RR, and ways to work accordingly. In trying to actually reproduce a published analysis, I encountered several practical obstacles. In this article, I focus on this working example to emphasize the manifold facets of RR, to provide possible explanations and solutions, and to argue that harmonized curricula for (quantitative) clinical researchers should include RR principles. I hope these experiences help raise awareness among educators and students. RR working habits are beneficial not only for ourselves and our students, but also for other researchers within an institution, for scientific partners, for the scientific community, and eventually for the public profiting from research findings.


Allen Institute open-sources AllenAct, a framework for research in embodied AI

#artificialintelligence

Researchers at the Allen Institute for AI today launched AllenAct, a platform intended to promote reproducible research in embodied AI with a focus on modularity and flexibility. AllenAct, which is available in beta, supports multiple training environments and algorithms with tutorials, pretrained models, and out-of-the-box real-time visualizations. Embodied AI, the AI subdomain concerning systems that learn to complete tasks through environmental interactions, has experienced substantial growth. The Allen Institute argues that this growth has been mostly beneficial, but it takes issue with the fragmented nature of embodied AI development tools, which it says discourages good science. In a recent analysis, the Allen Institute found that the number of embodied AI papers now exceeds 160 (up from around 20 in 2018 and 60 in 2019) and that the number of environments, tasks, modalities, and algorithms varies widely among them.


What is Academic Torrents and Where is Data Sharing Going?

#artificialintelligence

Academic Torrents is a platform for researchers to share data. It consists of two pieces: a site where users can search for datasets, and a BitTorrent backbone that makes sharing data scalable and fast. The goal is to facilitate the sharing of datasets among researchers. It was created by the Institute for Reproducible Research (a U.S. 501(c)(3) non-profit). The site provides access to over 15 TB of data, including popular machine learning datasets such as the full UCI repository, ImageNet, and Wikipedia.